\section{Introduction}

	With the explosive growth of intelligent surveillance cameras%
%\cite{Haering:2008}
, event data recorders, and camera-equipped smartphones, there is a growing need for automatic content analysis of the huge amount of video they produce. Across applications such as surveillance event recording, crowd flow analysis, traffic control, elderly care, and intelligent transportation systems, the core problem is the detection of human figures. 
%\begin{figure}[htbp]
%	%\centering
%	\includegraphics[bb=0 0 80 30]{blur_all.png}
%\caption{The images are often blurred and noisy in low quality video}
%\end{figure}
\begin{figure}[htb]

\begin{minipage}[b]{1.0\linewidth}
 \centering
 \centerline{\includegraphics[scale=0.3]{blur}}
%  \vspace{2.0cm}
\end{minipage}
\caption{Examples of blurred, deformed and noisy images}
\label{fig:blur}
%%
\end{figure} 
	People detection is considered a very challenging problem in computer vision. The difficulty of modelling a person stems from two sources. First, the variability in human appearance (clothing, articulated body parts, pose), in viewpoint due to camera placement, and in interactions with other objects or people increases the difficulty. Second, many real-world scenarios are further complicated by backgrounds that vary under different lighting conditions, by high crowd density, and by multiple occlusions in public areas such as stadiums or stations.   
	
 	The characteristics of cameras vary from place to place. Video quality depends on resolution, noise, blur, and the encoding algorithm chosen to satisfy communication-bandwidth and storage constraints. The resulting quality is often poor, and in some scenarios it degrades further (Fig.~\ref{fig:blur}). Because most existing techniques rely heavily on visual information, the lack of sufficient low-level appearance features in such video lowers their discriminative ability. The authors of~\cite{Dollar:2012je} suggest adding motion information to the features used for people detection.
 \begin{figure}[htb] 
\begin{minipage}[b]{1.0\linewidth}
 \centering
 \centerline{\includegraphics[width=8.0cm]{architecture_1}}
%  \vspace{2.0cm}
\end{minipage}
\caption{Overview of the HOOF/HOG human detector}
\label{fig:architecture}
%%
\end{figure} 
 
	The most representative histogram-based feature is HOG~\cite{Dalal:2006kz}, which has been highly successful in people detection. In this paper we introduce another feature, the integral Histogram of Oriented Optical Flow (HOOF) (Fig.~\ref{fig:architecture}(a)). The overall architecture of our histogram-based head detector is shown in Fig.~\ref{fig:architecture}. HOOF offers four benefits. First, it shares the same architecture as HOG, which makes it a reusable component that favors hardware design. Second, raw motion data is scale-invariant and does not degrade as much as visual data under the poor conditions found in some real-world cases, so it can complement a detector that lacks sufficient visual information. Third, the optical flow pattern of an individual looks similar across body parts connected by joints. Finally, HOOF is not only used as a feature to detect people: we also employ it to segment moving objects and to connect salient moving parts into larger regions of interest, which reduces the computational cost of sliding-window search.    
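
To make the integral HOOF concrete, the following is a minimal Python sketch under stated assumptions, not the implementation used in this paper. It assumes a dense optical flow field with horizontal and vertical components $u$ and $v$ has already been computed by some dense flow method, bins each flow vector's orientation over $[0, 2\pi)$ into a hypothetical default of 9 bins weighted by its magnitude, and builds per-bin cumulative sums so that the histogram of any rectangular cell or window is obtained from four lookups, as with integral images. The function names \texttt{flow\_orientation\_bins}, \texttt{integral\_histogram} and \texttt{region\_hoof}, the bin count, and the L1 normalization are illustrative choices.

```python
import numpy as np


def flow_orientation_bins(u, v, n_bins=9):
    """Per-pixel orientation bin index and magnitude of a dense flow field.

    u, v: 2-D arrays with the horizontal and vertical flow components
    (assumed to come from any dense optical flow method).
    """
    mag = np.hypot(u, v)
    # Full-circle orientation in [0, 2*pi): the sign of motion is kept.
    ang = np.mod(np.arctan2(v, u), 2 * np.pi)
    # Clamp guards against floating-point rounding up to exactly 2*pi.
    idx = np.minimum((ang / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    return idx, mag


def integral_histogram(u, v, n_bins=9):
    """Integral histogram H of shape (rows + 1, cols + 1, n_bins).

    H[i, j, b] is the summed flow magnitude of bin b over pixels
    [0, i) x [0, j); the zero padding simplifies region queries.
    """
    idx, mag = flow_orientation_bins(u, v, n_bins)
    rows, cols = u.shape
    per_bin = np.zeros((rows, cols, n_bins))
    # Scatter each pixel's magnitude into its own orientation bin.
    per_bin[np.arange(rows)[:, None], np.arange(cols)[None, :], idx] = mag
    H = np.zeros((rows + 1, cols + 1, n_bins))
    H[1:, 1:] = per_bin.cumsum(axis=0).cumsum(axis=1)
    return H


def region_hoof(H, top, left, bottom, right):
    """L1-normalised HOOF of rows [top, bottom) and cols [left, right).

    Uses four lookups into the integral histogram, independent of region size.
    """
    h = H[bottom, right] - H[top, right] - H[bottom, left] + H[top, left]
    total = h.sum()
    return h / total if total > 0 else h


# Usage: uniform rightward motion falls entirely into bin 0.
u = np.ones((4, 4))
v = np.zeros((4, 4))
print(region_hoof(integral_histogram(u, v), 0, 0, 4, 4))
```

Each region query costs a constant four lookups per bin regardless of window size, which is what makes the integral form attractive when the same flow field is evaluated by many overlapping cells and sliding windows.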

	In surveillance video, special events often draw large crowds and cause the notorious occlusion problem, on which many methods fail%
%\cite{Zeng:2010PR}
. Even when parts of the body are occluded, the head and shoulders are often the last parts to be hidden and remain recognizable to humans~\cite{He:2011SP}. The head and shoulders are also less flexible than other body parts, and their HOOF looks similar across different directions. Because HOOF is not influenced by changes in an object's perspective or by its deformation, using it together with HOG is more robust than using HOG alone.

	Early approaches detect moving people by applying background extraction methods based on change detection of motion features~\cite{grabner:2007}; the more precisely the foreground is extracted, the more accurate the model of the human object becomes. However, recent trends in object detection suggest exploring visual gist or attention~\cite{bileschi:2008}. The visual context includes the surroundings of the object, the texture within the object, and its appearance. We are therefore interested in finding regions that contain both the object and enough background. The authors of~\cite{Dalal:2006kz} likewise suggest leaving enough surrounding margin to emphasize the shape described by the gradients. The same idea applies to HOOF: the relative movement between local pixel regions cannot be identified unless enough background information is given. For segmentation, we apply connected component labelling to the global map of HOOF cells and merge small components together. 
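
The segmentation step described above can be sketched as follows. This is a hypothetical Python illustration, not the paper's procedure: it assumes the global HOOF cell map has already been thresholded into a boolean grid of moving cells (no threshold is specified here), labels 4-connected components with a breadth-first search, and then merges component bounding boxes separated by at most \texttt{gap} empty cells into larger regions of interest. The proximity-based merge rule and the names \texttt{label\_cells}, \texttt{component\_boxes} and \texttt{merge\_boxes} are our assumptions.

```python
from collections import deque


def label_cells(moving):
    """4-connected component labelling of a 2-D grid of moving cells.

    moving: list of lists of truthy/falsy values (True = moving cell).
    Returns (labels, n): labels[r][c] is 0 for background, else 1..n.
    """
    rows, cols = len(moving), len(moving[0])
    labels = [[0] * cols for _ in range(rows)]
    n = 0
    for r in range(rows):
        for c in range(cols):
            if moving[r][c] and labels[r][c] == 0:
                n += 1
                labels[r][c] = n
                queue = deque([(r, c)])
                # Breadth-first flood fill of the new component.
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and moving[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = n
                            queue.append((ny, nx))
    return labels, n


def component_boxes(labels, n):
    """Inclusive bounding box (top, left, bottom, right) of each label 1..n."""
    boxes = [None] * n
    for r, row in enumerate(labels):
        for c, k in enumerate(row):
            if k:
                b = boxes[k - 1]
                boxes[k - 1] = ((r, c, r, c) if b is None else
                                (min(b[0], r), min(b[1], c),
                                 max(b[2], r), max(b[3], c)))
    return boxes


def _near(a, b, gap):
    """True if boxes a and b are at most `gap` empty cells apart on each axis."""
    reach = gap + 1
    return (a[1] <= b[3] + reach and b[1] <= a[3] + reach and
            a[0] <= b[2] + reach and b[0] <= a[2] + reach)


def merge_boxes(boxes, gap=1):
    """Greedily merge nearby boxes until no pair is within `gap` cells."""
    boxes = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if _near(boxes[i], boxes[j], gap):
                    a, b = boxes[i], boxes[j]
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    changed = True
                    break
            if changed:
                break
    return boxes


# Usage: two nearby blobs merge into one region; a distant cell stays separate.
grid = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
]
labels, n = label_cells(grid)
print(merge_boxes(component_boxes(labels, n), gap=1))
```

The merged boxes, expressed in cell coordinates, would then bound the sliding-window search; a larger \texttt{gap} yields fewer, larger regions that keep more background around each moving person.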
%Then the sliding window will be able to search these interest areas using HOG and the HOOF together. 	

	The remainder of the paper is organized as follows. Section~\ref{review} reviews recent work. Section~\ref{architecture} describes our system in detail and explains how HOOF is extracted. Section~\ref{experiment} describes the data set we use and the experimental verification of the proposed features. Finally, conclusions are given in Section~\ref{conclusion}. 
